01. Connect Python(Jupyter)

Python에서 Bigquery를 사용하는 방법은 1) gcloud의 bigquery를 사용하는 방법과 2) pandas 내장 gbq를 사용 하는 방법 2가지가 있습니다

gcloud의 bigquery 사용

gcloud sdk install을 따라서 설치해주세요! 그 후 아래와 같이 진행해주세요
```
(터미널에서)
pip3 install gcloud
gcloud auth login
```
그 후, GOOGLE_APPLICATION_CREDENTIALS을 설정해줍니다
gcloud auth로 하는 방법, API 사용자 인증 정보 - service_key.json을 사용하는 방법 등이 있습니다
json key file을 설정하는 방법은 링크를 클릭해주세요!



In [1]:

    
from gcloud import bigquery as bq
import uuid
import os

gcloud auth로 진행한 경우



In [2]:

    
client = bq.Client()

service_key.json을 사용하는 경우



In [4]:

    
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Path/to/service_key.json"



In [3]:

    
client = bq.Client()



In [ ]:

    
# client의 project명 print
print(client.project)



In [4]:

    
# dataset 설정
dataset = client.dataset("bigquery-public-data:samples")

async_query



In [5]:

    
query_job = client.run_async_query(str(uuid.uuid4()), """
        #standardSQL
        SELECT corpus AS title, COUNT(*) AS unique_words
        FROM `publicdata.samples.shakespeare`
        GROUP BY title
        ORDER BY unique_words DESC
        LIMIT 10""")



In [8]:

    
# query 실행
query_job.begin()



In [9]:

    
destination_table = query_job.destination
destination_table.reload()
for row in destination_table.fetch_data():
    print(row)









    



[('hamlet', 5318), ('kinghenryv', 5104), ('cymbeline', 4875), ('troilusandcressida', 4795), ('kinglear', 4784), ('kingrichardiii', 4713), ('2kinghenryvi', 4683), ('coriolanus', 4653), ('2kinghenryiv', 4605), ('antonyandcleopatra', 4582)]
10
None

sync_query



In [10]:

    
query_string = '''
        #standardSQL
        SELECT corpus AS title, COUNT(*) AS unique_words
        FROM `publicdata.samples.shakespeare`
        GROUP BY title
        ORDER BY unique_words DESC
        LIMIT 10
        '''



In [11]:

    
query = client.run_sync_query(query_string)
query.timeout_ms = 5 * 60 * 1000



In [12]:

    
query.run()



In [13]:

    
data = query.fetch_data()[0]



In [14]:

    
data









    Out[14]:





[('hamlet', 5318),
 ('kinghenryv', 5104),
 ('cymbeline', 4875),
 ('troilusandcressida', 4795),
 ('kinglear', 4784),
 ('kingrichardiii', 4713),
 ('2kinghenryvi', 4683),
 ('coriolanus', 4653),
 ('2kinghenryiv', 4605),
 ('antonyandcleopatra', 4582)]

2) pandas 내장 gbq 사용



In [6]:

    
from pandas.io import gbq



In [8]:

    
query = '''
        #standardSQL
        SELECT corpus AS title, COUNT(*) AS unique_words
        FROM `publicdata.samples.shakespeare`
        GROUP BY title
        ORDER BY unique_words DESC
        LIMIT 10
        '''



In [13]:

    
gbq.read_gbq(query = query, project_id = 'project_id', dialect='standard')









    



Requesting query... ok.
Job ID: job_-HhwJshSnAMGgO8VOow5U0vRCS4G
Query running...
Query done.
Processed: 2.4 MB
Standard price: $0.00 USD

Retrieving results...
Got 10 rows.

Total time taken 2.06 s.
Finished at 2017-10-03 21:15:04.






    Out[13]:







  
    
      
      title
      unique_words
    
  
  
    
      0
      hamlet
      5318
    
    
      1
      kinghenryv
      5104
    
    
      2
      cymbeline
      4875
    
    
      3
      troilusandcressida
      4795
    
    
      4
      kinglear
      4784
    
    
      5
      kingrichardiii
      4713
    
    
      6
      2kinghenryvi
      4683
    
    
      7
      coriolanus
      4653
    
    
      8
      2kinghenryiv
      4605
    
    
      9
      antonyandcleopatra
      4582

	title	unique_words
0	hamlet	5318
1	kinghenryv	5104
2	cymbeline	4875
3	troilusandcressida	4795
4	kinglear	4784
5	kingrichardiii	4713
6	2kinghenryvi	4683
7	coriolanus	4653
8	2kinghenryiv	4605
9	antonyandcleopatra	4582